Fast Community Detection by Score
Abstract
Consider a network whose nodes split into K different communities. The community labels of the nodes are unknown, and it is of major interest to estimate them (i.e., community detection). The Degree Corrected Block Model (DCBM) is a popular network model. How to detect communities under the DCBM is an interesting problem, where the main challenge lies in the degree heterogeneity. We propose a new approach to community detection, which we call Spectral Clustering On Ratios-of-Eigenvectors (SCORE). Compared to classical spectral methods, the main innovation is to use the entry-wise ratios between the first leading eigenvector and each of the other leading eigenvectors for clustering. Let X be the adjacency matrix of the network. We first obtain the K leading eigenvectors, say, η̂_1, ..., η̂_K, and let R̂ be the n × (K−1) matrix such that R̂(i, k) = η̂_{k+1}(i)/η̂_1(i), 1 ≤ i ≤ n, 1 ≤ k ≤ K−1. We then cluster the rows of R̂ by applying the k-means method. The central surprise is that the effect of degree heterogeneity is largely ancillary and can be effectively removed by taking the entry-wise ratios between η̂_{k+1} and η̂_1, 1 ≤ k ≤ K−1. The method is successfully applied to the web blogs data and the karate club data, with error rates of 58/1222 and 1/34, respectively. These results are much more satisfactory than those of the classical spectral methods. Also, compared to modularity methods, SCORE is computationally much faster and has smaller error rates. We develop a theoretical framework in which we show that, under mild conditions, SCORE stably yields successful community detection. At the core of the analysis are recent developments in Random Matrix Theory (RMT), where the matrix-form Bernstein inequality is especially helpful.
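The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It makes several assumptions: "leading" eigenvectors are taken to be those with the largest eigenvalues in magnitude, every entry of η̂_1 is assumed to be nonzero (which holds for a connected network), and the k-means step is a plain Lloyd iteration with a farthest-point initialization. The function names `score_communities` and `_kmeans`, and all simulation parameters in the demo, are hypothetical.

```python
import numpy as np


def _kmeans(points, k, n_iter, seed):
    """Plain Lloyd's k-means with farthest-point initialization (illustrative)."""
    rng = np.random.default_rng(seed)
    centers = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def score_communities(adjacency, n_communities, n_iter=100, seed=0):
    """Sketch of SCORE: k-means on entry-wise ratios of leading eigenvectors."""
    A = np.asarray(adjacency, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(A)  # A is assumed symmetric
    # Assumption: "leading" means largest eigenvalues in absolute value.
    order = np.argsort(-np.abs(eigvals))[:n_communities]
    eta = eigvecs[:, order]
    # R(i, k) = eta_{k+1}(i) / eta_1(i); assumes eta_1 has no zero entries.
    ratios = eta[:, 1:] / eta[:, [0]]
    return _kmeans(ratios, n_communities, n_iter, seed)


if __name__ == "__main__":
    # Simulated two-community DCBM; all parameter values are arbitrary choices.
    rng = np.random.default_rng(0)
    n, K = 300, 2
    truth = np.repeat(np.arange(K), n // K)
    theta = rng.uniform(0.3, 1.0, size=n)  # degree heterogeneity parameters
    P = np.array([[0.9, 0.1], [0.1, 0.9]])
    prob = np.outer(theta, theta) * P[truth][:, truth]
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    A = (upper + upper.T).astype(float)  # symmetric, zero diagonal
    labels = score_communities(A, K)
    # Labels are only defined up to a swap of the two community names.
    err = min(np.mean(labels != truth), np.mean(labels == truth))
    print(f"misclassification rate: {err:.3f}")
```

Dividing by the first eigenvector is what distinguishes this sketch from classical spectral clustering, which would run k-means on the eigenvectors themselves. In the noise-free DCBM each eigenvector entry carries the node's degree parameter as a common factor, so the ratio cancels it and nodes in the same community share the same row of R̂.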
Related resources
Leaders, Followers, and Community Detection
Communities in social networks or graphs are sets of well-connected, overlapping vertices. The effectiveness of a community detection algorithm is determined by accuracy in finding the ground-truth communities and ability to scale with the size of the data. In this work, we provide three contributions. First, we show that a popular measure of accuracy known as the F1 score, which is between 0 a...
Fast network community detection by SCORE
Consider a network where the nodes split into K different communities. The community labels for the nodes are unknown and it is of major interest to estimate them (i.e., community detection). Degree Corrected Block Model (DCBM) is a popular network model. How to detect communities with the DCBM is an interesting problem, where the main challenge lies in the degree heterogeneity. We propose Spec...
Fast Food Consumption Status and Its Determinants in Iranian Population During COVID-19 Outbreak
Background and Objectives: Fast-food consumption is associated with obesity and non-communicable diseases, leading to the severity of COVID-19 status. The aim of this study was to investigate fast-food consumption status and its determinants in Iranian population during the epidemic. Materials & Methods: This cross-sectional study was carried out on 891 Iranian adults from most regions of the ...
Material for “Fast Community Detection by Score”
It is seen that • The information of the community labels is contained in the term within the bracket, which depends on {θ(i)}, i = 1, ..., n, only through the overall degree intensities ‖θ^(k)‖/‖θ‖. • The diagonal matrix Θ does not contain any information of the community labels. • Therefore, {θ(i)}, i = 1, ..., n, are almost nuisance parameters, the effect of which can be removed by many scaling invariant mappings, to...
Fast Detection of Community Structures using Graph Traversal in Social Networks
Finding community structures in social networks is considered to be a challenging task, as many of the proposed algorithms are computationally expensive and do not scale well to large graphs. Most of the community detection algorithms proposed to date are unsuitable for applications that require detection of communities in real time, especially for massive networks. The Louvain me...
Fast Community Detection by Score By
Consider a network where the nodes split into K different communities. The community labels for the nodes are unknown and it is of major interest to estimate them (i.e., community detection). Degree Corrected Block Model (DCBM) is a popular network model. How to detect communities with the DCBM is an interesting problem, where the main challenge lies in the degree heterogeneity. We propose a ne...